Languages and Automata



Formal languages are very important in CS 
Especially in programming languages 

• Regular languages 

The weakest formal languages widely used 
Many applications 

We will also study context-free languages, tree 
languages 

Beyond Regular Languages



Many languages are not regular 



Strings of balanced parentheses are not regular:

$\{\, (^i )^i \mid i \ge 0 \,\}$

What Can Regular Languages Express?



Languages requiring counting modulo a fixed integer 



Intuition: A finite automaton that runs long enough 
must repeat states 



Finite automaton can't remember # of times it has 
visited a particular state 
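
Counting modulo a fixed integer can be sketched as a small simulation. The function below is an illustrative assumption (the name `accepts_ones_mod_k` and the alphabet {0, 1} are not from the lecture): it behaves like a finite automaton with k states and accepts a string when the number of 1s is divisible by k.

```python
def accepts_ones_mod_k(text, k):
    """Simulate a k-state automaton: accept iff the number of '1's is divisible by k."""
    state = 0  # the state is the number of '1's seen so far, modulo k
    for ch in text:
        if ch == "1":
            state = (state + 1) % k
        elif ch != "0":
            raise ValueError(f"symbol {ch!r} is not in the alphabet")
    return state == 0


print(accepts_ones_mod_k("0110", 2))  # even number of 1s
print(accepts_ones_mod_k("111", 2))   # odd number of 1s
```

Only k states are needed however long the input is. Matching i opening parentheses against i closing ones would need a separate state for every possible i, which a finite automaton cannot provide.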

The Functionality of the Parser

Input: sequence of tokens from lexer

Output: parse tree of the program

Example

• Cool

if x = y then 1 else 2 fi

• Parser input

IF ID = ID THEN INT ELSE INT FI

• Parser output

IF-THEN-ELSE (root of the parse tree)
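
The step from program text to parser input can be sketched with a toy lexer. This is a minimal sketch that only covers the symbols in the example above; the names `KEYWORDS`, `TOKEN_RE` and `tokenize` are assumptions made for illustration, and a real Cool lexer handles far more.

```python
import re

# Assumption: only the keywords that appear in the example are recognised.
KEYWORDS = {"if", "then", "else", "fi"}

# One token per match: an integer, a word, or the '=' sign.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(=))")


def tokenize(source):
    """Turn a string of characters into a list of token names."""
    source = source.rstrip()
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        integer, word, equals = match.groups()
        if integer is not None:
            tokens.append("INT")
        elif word is not None:
            tokens.append(word.upper() if word in KEYWORDS else "ID")
        else:
            tokens.append("=")
        pos = match.end()
    return tokens


print(" ".join(tokenize("if x = y then 1 else 2 fi")))
```

The result is the token string that the parser receives; the identifiers x and y are both reduced to ID.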

Comparison with Lexical Analysis

Phase     Input                   Output
Lexer     String of characters    String of tokens
Parser    String of tokens        Parse tree

The Role of the Parser



Not all strings of tokens are valid programs . . . 

. . . parser must distinguish between valid and invalid 
strings of tokens 

We need 

A language for describing valid strings of tokens 

A method for distinguishing valid from invalid 
strings of tokens 

Context-Free Grammars



Programming language constructs have recursive 
structure 

Expressions are themselves recursively composed of 
other expressions. 

An EXPR is 

if EXPR then EXPR else EXPR fi 
while EXPR loop EXPR pool 



Context-free grammars are a natural notation for 
describing this recursive structure 

CFGs (Cont.)

• A CFG consists of

A set of terminals T (the alphabet of the language)
A set of non-terminals N
A start symbol S (a non-terminal)
A set of productions

$X \to Y_1 Y_2 \dots Y_n$

where $X \in N$ and $Y_i \in T \cup N$
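
The four components can be written down as a small data structure. This is a sketch under stated assumptions: the class name `Grammar`, its field names and the `check` method are hypothetical, and the example grammar is the EXPR fragment with an `id` alternative so that it has a non-recursive production.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # T
    nonterminals: frozenset   # N
    start: str                # S, must be in N
    productions: tuple        # pairs (X, (Y1, ..., Yn))

    def check(self):
        """Raise ValueError unless the four components fit together."""
        if self.terminals & self.nonterminals:
            raise ValueError("a symbol cannot be both terminal and non-terminal")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a non-terminal")
        symbols = self.terminals | self.nonterminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a non-terminal")
            for symbol in rhs:
                if symbol not in symbols:
                    raise ValueError(f"unknown symbol {symbol!r}")
        return True


EXPR_FRAGMENT = Grammar(
    terminals=frozenset({"if", "then", "else", "fi", "while", "loop", "pool", "id"}),
    nonterminals=frozenset({"EXPR"}),
    start="EXPR",
    productions=(
        ("EXPR", ("if", "EXPR", "then", "EXPR", "else", "EXPR", "fi")),
        ("EXPR", ("while", "EXPR", "loop", "EXPR", "pool")),
        ("EXPR", ("id",)),
    ),
)

print(EXPR_FRAGMENT.check())
```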

Notational Conventions



Notes 

• Non-terminals are written upper-case 

• Terminals are written lower-case 

The start symbol is the left-hand side of the first 
production 

Examples of CFGs

EXPR -> if EXPR then EXPR else EXPR fi
      | while EXPR loop EXPR pool
      | id

Examples of CFGs (cont.)

Simple arithmetic expressions:

E -> E * E
   | E + E
   | (E)
   | id

The Language of a CFG

Read productions as rules:

$X \to Y_1 \dots Y_n$

Means X can be replaced by $Y_1 \dots Y_n$

Idea

1. Begin with a string consisting of the start symbol "S"

2. Replace any non-terminal X in the string by the right-hand side of some production

$X \to Y_1 \dots Y_n$

3. Repeat (2) until there are no non-terminals in the string
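
The three steps can be sketched as a replacement loop over a list of symbols. This is a minimal sketch using the arithmetic grammar; the names `GRAMMAR`, `replace_at` and `derive` are illustrative assumptions, and the choice of which non-terminal to replace is supplied by the caller as a list of (position, right-hand side) pairs.

```python
# E -> E + E | E * E | ( E ) | id, with E as the start symbol
GRAMMAR = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["id"]],
}


def replace_at(form, position, rhs):
    """Step 2: replace the non-terminal at `position` by the right-hand side `rhs`."""
    symbol = form[position]
    if symbol not in GRAMMAR:
        raise ValueError(f"{symbol!r} is a terminal and cannot be replaced")
    if rhs not in GRAMMAR[symbol]:
        raise ValueError(f"no production {symbol} -> {' '.join(rhs)}")
    return form[:position] + rhs + form[position + 1:]


def derive(start, steps):
    """Step 1: begin with the start symbol. Step 3: apply the chosen replacements in turn."""
    form = [start]
    for position, rhs in steps:
        form = replace_at(form, position, rhs)
    return form


result = derive("E", [(0, ["E", "+", "E"]), (2, ["id"]), (0, ["id"])])
print(" ".join(result))
print(any(symbol in GRAMMAR for symbol in result))  # are non-terminals left?
```

Any non-terminal may be chosen at each step; here the right one is replaced before the left one.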

The Language of a CFG (Cont.)

More formally, write

$X_1 \dots X_{i-1} X_i X_{i+1} \dots X_n \to X_1 \dots X_{i-1} Y_1 \dots Y_m X_{i+1} \dots X_n$

if there is a production

$X_i \to Y_1 \dots Y_m$

The Language of a CFG (Cont.)

Write

$X_1 \dots X_n \to^* Y_1 \dots Y_m$

if

$X_1 \dots X_n \to \dots \to Y_1 \dots Y_m$

in 0 or more steps

The Language of a CFG



Let G be a context-free grammar with start symbol S. 

Then the language of G is: 




$\{\, a_1 \dots a_n \mid S \to^* a_1 \dots a_n \text{ and every } a_i \text{ is a terminal} \,\}$



S goes in zero or more steps to a string of terminals 
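
The definition can be explored by enumerating short members of L(G). The sketch below is an assumption-laden illustration, not a general tool: the names `ARITH` and `language_up_to` are hypothetical, and the length-based pruning is only valid because no production of this grammar has an empty right-hand side.

```python
from collections import deque

# E -> E + E | E * E | ( E ) | id
ARITH = {"E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("id",)]}


def language_up_to(grammar, start, max_len):
    """Return every terminal string of at most max_len symbols derivable from start.

    Assumes no production has an empty right-hand side, so strings never
    shrink and anything longer than max_len can be discarded.
    """
    seen = {(start,)}
    queue = deque([(start,)])
    members = set()
    while queue:
        form = queue.popleft()
        # Position of the first non-terminal, or None if all symbols are terminals.
        index = next((i for i, s in enumerate(form) if s in grammar), None)
        if index is None:
            members.add(" ".join(form))
            continue
        for rhs in grammar[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return members


print(sorted(language_up_to(ARITH, "E", 3)))
```

Replacing only the first non-terminal of each string loses nothing here, because the order of replacements does not change which terminal strings can be reached.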

Terminals



Terminals are so-called because there are no rules for 
replacing them 



Once generated, terminals are permanent 



Terminals must be tokens of the language 

Examples

L(G) is the language of CFG G

Strings of balanced parentheses

$\{\, (^i )^i \mid i \ge 0 \,\}$

Two grammars:

S -> (S)
S -> ε

or

S -> (S) | ε
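
Assuming the grammar S -> (S) | ε for this language, membership can be checked with a function that calls itself once per level of nesting. The names `parse_s` and `in_language` are hypothetical; this is a sketch of the idea rather than a parser built by a tool.

```python
def parse_s(text, pos=0):
    """Match one S starting at pos; return the position after it, or None on failure."""
    if pos < len(text) and text[pos] == "(":
        # S -> ( S )
        after_inner = parse_s(text, pos + 1)
        if after_inner is None or after_inner >= len(text) or text[after_inner] != ")":
            return None
        return after_inner + 1
    # S -> ε matches without consuming anything
    return pos


def in_language(text):
    """True iff the whole string is derived from S."""
    return parse_s(text) == len(text)


for sample in ["", "()", "(())", "(()", "()()"]:
    print(repr(sample), in_language(sample))
```

The recursion depth plays the role of the counter that a finite automaton lacks. Note that "()()" is rejected: it is not of the form i opening parentheses followed by i closing ones.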

Cool Example

A fragment of COOL:

EXPR -> if EXPR then EXPR else EXPR fi
      | while EXPR loop EXPR pool
      | id

Cool Example (Cont.)



Some elements of the language 



id 

if id then id else id fi 

while id loop id pool 

if while id loop id pool then id else id fi

if if id then id else id fi then id else id fi 

Arithmetic Example

Simple arithmetic expressions:

E -> E + E | E * E | (E) | id

Some elements of the language: 



id 

(id) 

(id) * id 



id + id 
id * id 
id * (id) 

Quiz

Which of the strings are in the language of the given CFG?

S -> aXa
X -> ε | bY
Y -> ε | cXc

□ abcba
□ acca
□ aba
□ abcbcba

Notes

The idea of a CFG is a big step. But we still need some other things:

• Membership in a language is "yes" or "no"; we also need the parse tree of the input

• Must handle errors

• Need an implementation of CFGs (e.g., bison)

More Notes



Form of the grammar is important 

Many grammars generate the same language 
Tools are sensitive to the grammar 

Note: Tools for regular languages (e.g., flex) are sensitive 
to the form of the regular expression, but this is rarely a 
problem in practice 

Derivations and Parse Trees

A derivation is a sequence of productions

$S \to \dots \to \dots \to \dots$

A derivation can be drawn as a tree
Start symbol is the tree's root

For a production $X \to Y_1 \dots Y_n$ add children $Y_1 \dots Y_n$ to node X

Derivation Example

• Grammar

E -> E + E | E * E | (E) | id

• String

id * id + id

Derivation Example (Cont.)

E
-> E + E
-> E * E + E
-> id * E + E
-> id * id + E
-> id * id + id
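
Each line of a derivation must follow from the previous one by a single production. A minimal sketch of that check, with the hypothetical names `one_step` and `is_derivation` and symbols separated by spaces:

```python
# E -> E + E | E * E | ( E ) | id
GRAMMAR = {"E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("id",)]}


def one_step(before, after, grammar):
    """True if `after` results from replacing exactly one non-terminal in `before`."""
    for i, symbol in enumerate(before):
        for rhs in grammar.get(symbol, ()):
            if before[:i] + rhs + before[i + 1:] == after:
                return True
    return False


def is_derivation(lines, grammar, start):
    """True if the lines begin with the start symbol and each step uses one production."""
    forms = [tuple(line.split()) for line in lines]
    if not forms or forms[0] != (start,):
        return False
    return all(one_step(a, b, grammar) for a, b in zip(forms, forms[1:]))


steps = ["E", "E + E", "E * E + E", "id * E + E", "id * id + E", "id * id + id"]
print(is_derivation(steps, GRAMMAR, "E"))
print(is_derivation(["E", "id + id"], GRAMMAR, "E"))
```

The second call fails because going from E to id + id needs more than one replacement.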

Derivation in Detail (1)

E

Derivation in Detail (2)

E
-> E + E

Derivation in Detail (3)

E
-> E + E
-> E * E + E

Derivation in Detail (4)

E
-> E + E
-> E * E + E
-> id * E + E

Derivation in Detail (5)

E
-> E + E
-> E * E + E
-> id * E + E
-> id * id + E

Derivation in Detail (6)

E
-> E + E
-> E * E + E
-> id * E + E
-> id * id + E
-> id * id + id

Notes on Derivations



A parse tree has 

Terminals at the leaves 
Non-terminals at the interior nodes 

An in-order traversal of the leaves is the original input 

The parse tree shows the association of operations, the 
input string does not 
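
These properties can be seen on a small tree structure. The class `Node` and the function `leaves` are assumptions made for illustration. Two different trees are built for the arithmetic grammar; both have the same leaves, read left to right, but they group the operations differently.

```python
class Node:
    def __init__(self, symbol, children=()):
        self.symbol = symbol            # a terminal or non-terminal
        self.children = list(children)  # empty for leaves


def leaves(node):
    """Collect the leaves from left to right."""
    if not node.children:
        return [node.symbol]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result


def e_id():
    """Subtree for the production E -> id."""
    return Node("E", [Node("id")])


# Groups the input as (id * id) + id
times_first = Node("E", [Node("E", [e_id(), Node("*"), e_id()]), Node("+"), e_id()])
# Groups the input as id * (id + id)
plus_first = Node("E", [e_id(), Node("*"), Node("E", [e_id(), Node("+"), e_id()])])

print(" ".join(leaves(times_first)))
print(" ".join(leaves(plus_first)))
```

Both calls give back the same input string, so the association of the operations is visible only in the tree.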

Left-most and Right-most Derivations

• The example is a left-most derivation

At each step, replace the left-most non-terminal

• There is an equivalent notion of a right-most derivation

E
-> E + E
-> E + id
-> E * E + id
-> E * id + id
-> id * id + id

Right-most Derivation in Detail (1)

E

Right-most Derivation in Detail (2)

E
-> E + E

Right-most Derivation in Detail (3)

E
-> E + E
-> E + id

Right-most Derivation in Detail (4)

E
-> E + E
-> E + id
-> E * E + id

Right-most Derivation in Detail (5)

E
-> E + E
-> E + id
-> E * E + id
-> E * id + id

Right-most Derivation in Detail (6)

E
-> E + E
-> E + id
-> E * E + id
-> E * id + id
-> id * id + id

Derivations and Parse Trees



Note that right-most and left-most derivations have 
the same parse tree 



The difference is the order in which branches are added 
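
One way to see this is to start from a single parse tree and read off both derivations. The sketch below uses hypothetical names (`Node`, `derivation`) and assumes a grammar without ε-productions, so that every interior node has children and every node without children is a terminal.

```python
class Node:
    def __init__(self, symbol, children=()):
        self.symbol = symbol
        self.children = list(children)


def derivation(root, leftmost=True):
    """List the strings obtained by expanding the tree one node at a time."""
    form = [root]
    steps = [" ".join(node.symbol for node in form)]
    while True:
        # Nodes with children are non-terminals that have not been expanded yet.
        expandable = [i for i, node in enumerate(form) if node.children]
        if not expandable:
            return steps
        i = expandable[0] if leftmost else expandable[-1]
        form[i:i + 1] = form[i].children
        steps.append(" ".join(node.symbol for node in form))


def e_id():
    return Node("E", [Node("id")])


# Parse tree for id * id + id with + at the root
tree = Node("E", [Node("E", [e_id(), Node("*"), e_id()]), Node("+"), e_id()])

print(" -> ".join(derivation(tree, leftmost=True)))
print(" -> ".join(derivation(tree, leftmost=False)))
```

The same tree yields both sequences; only the order in which the branches are expanded changes.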

Quiz

Which of the following is a valid derivation of the given grammar?

S -> aXa
X -> ε | bY
Y -> ε | cXc | d

□ S -> aXa -> abYa -> acXca -> acca

□ S -> aXa -> abYa -> abcXca -> abcbYca -> abcbdca

□ S -> aXa -> abYa -> abcXcda -> abccda





Dr. Sherin ElGokhy 



Quiz

Which of the following is a valid parse tree for the given grammar?

S -> aXa
X -> ε | bY
Y -> ε | cXc | d

[Figure: candidate parse trees]

Summary of Derivations

We are not just interested in whether $s \in L(G)$
We need a parse tree for s



A derivation defines a parse tree 

But one parse tree may have many derivations 



Left-most and right-most derivations are important in 
parser implementation 

Ambiguity

• Grammar

E -> E + E | E * E | (E) | id

• String

id * id + id

Ambiguity (Cont.)

This string has two parse trees

[Figure: the two parse trees for id * id + id, one grouping it as (id * id) + id and the other as id * (id + id)]

Ambiguity (Cont.)

A grammar is ambiguous if it has more than one parse tree for some string

Equivalently, there is more than one right-most or left-most derivation for some string

• Ambiguity is BAD 

Leaves meaning of some programs ill-defined 
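
For the arithmetic grammar E -> E + E | E * E | (E) | id, the number of parse trees of a token string can be counted directly. This is a sketch written for that one grammar only; the name `count_parse_trees` is hypothetical and tokens are assumed to be given as a list.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees that derive `tokens` from E."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        """Number of parse trees deriving tokens[lo:hi] from E."""
        total = 0
        if hi - lo == 1 and tokens[lo] == "id":
            total += 1                                      # E -> id
        if hi - lo >= 3 and tokens[lo] == "(" and tokens[hi - 1] == ")":
            total += count(lo + 1, hi - 1)                  # E -> ( E )
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                # E -> E + E or E -> E * E with the operator at position mid
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees("id * id + id".split()))
print(count_parse_trees("( id * id ) + id".split()))
```

A count above one for some string shows that the grammar is ambiguous. Adding parentheses to the input forces a single grouping, which is why the second string has only one tree.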

Quiz

Which of the following grammars are ambiguous?

□ S -> SS | a | b

□ E -> E + E | id

□ S -> Sa | Sb

□ E -> E' | E' + E
  E' -> -E' | id | (E)